Problem-Independent Architectures


A Related Work
Neural Architecture Search (NAS) was introduced to ease the process of manually designing complex

Neural Information Processing Systems

However, existing MP-NAS methods face architectural limitations that hinder their use in state-of-the-art (SOTA) search spaces, leaving the challenge of swiftly designing effective large models unresolved. Reported accuracy is obtained by training each network on ImageNet for 200 epochs. The comparison also includes an accuracy prediction model that operates without FLOPs information. Table 2 illustrates the outcomes of these models.


Appendix for Multi-task Graph Neural Architecture Search with Task-aware Collaboration and Curriculum

Neural Information Processing Systems

Notation: an operation; w: model weight; α: the architecture parameter; N: the number of chunks; θ: the trainable parameter in the soft task-collaborative module; p: the parameter generated by Eq. (9); p (Eq. (11) variant): the parameter generated by Eq. (11), which replaces the Eq. (9) parameter during curriculum training; δ: the parameter controlling graph-structure diversity; γ: the parameter controlling task-wise curriculum training.

BNRist stands for Beijing National Research Center for Information Science and Technology. Here we provide the detailed derivation of Eq. (10). For the other datasets, we use the task-separate head. The experimental results on the OGBG datasets are shown in Table 5: our method outperforms all the multi-task NAS baselines on the three datasets.



Adapting Neural Architectures Between Domains (Supplementary Material)
Yanxi Li¹, Zhaohui Yang²,³, Yunhe Wang

Neural Information Processing Systems

By combining Theorem 2 and Lemma 3, we derive the proof of Corollary 4.

There are two kinds of cells in the search space: normal cells and reduction cells. After a reduction cell, the number of channels is doubled. Cells are stacked sequentially to build a network. We use a set of 8 candidate operations: 3×3 separable convolution; 5×5 separable convolution; 3×3 dilated separable convolution; 5×5 dilated separable convolution; 3×3 max pooling; 3×3 average pooling; identity; and zero. All operations follow the ReLU-Conv/Pooling-BN pattern except identity and zero.
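The search-space description above can be made concrete with a small sketch. It assumes short string labels for the eight operations and places reduction cells at one third and two thirds of the network depth; that placement follows a common DARTS-style convention and is an assumption, since the text only states that the channel count doubles after each reduction cell.

```python
from typing import List, Tuple

# Short labels for the 8 candidate operations listed in the text (labels are illustrative).
CANDIDATE_OPS: List[str] = [
    "sep_conv_3x3", "sep_conv_5x5",
    "dil_sep_conv_3x3", "dil_sep_conv_5x5",
    "max_pool_3x3", "avg_pool_3x3",
    "identity", "zero",
]


def follows_relu_op_bn(op: str) -> bool:
    """Every op except identity and zero is wrapped as ReLU -> Conv/Pooling -> BN."""
    return op not in ("identity", "zero")


def cell_layout(num_cells: int, init_channels: int) -> List[Tuple[str, int]]:
    """Stack cells sequentially; each reduction cell doubles the channel count.

    Assumption: reduction cells sit at 1/3 and 2/3 of the depth (DARTS-style
    convention); the text does not specify their positions.
    """
    reduction_at = {num_cells // 3, 2 * num_cells // 3}
    channels = init_channels
    layout: List[Tuple[str, int]] = []
    for i in range(num_cells):
        if i in reduction_at:
            channels *= 2  # channel count doubles at a reduction cell
            layout.append(("reduction", channels))
        else:
            layout.append(("normal", channels))
    return layout


layout = cell_layout(num_cells=8, init_channels=16)
```

With 8 cells and 16 initial channels, the sketch yields three stages of 16, 32 and 64 channels.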


Rethinking Bias Mitigation: Fairer Architectures Make for Fairer Face Recognition

Neural Information Processing Systems

Face recognition systems are widely deployed in safety-critical applications, including law enforcement, yet they exhibit bias across a range of socio-demographic dimensions, such as gender and race. Conventional wisdom dictates that model biases arise from biased training data. As a consequence, previous works on bias mitigation largely focused on pre-processing the training data, adding penalties to prevent bias from affecting the model during training, or post-processing predictions to debias them, yet these approaches have shown limited success on hard problems such as face recognition. In our work, we discover that biases are actually inherent to neural network architectures themselves. Following this reframing, we conduct the first neural architecture search for fairness, jointly with a search for hyperparameters. Our search outputs a suite of models which Pareto-dominate all other high-performance architectures and existing bias mitigation methods in terms of accuracy and fairness, often by large margins, on the two most widely used datasets for face identification, CelebA and VGGFace2. Furthermore, these models generalize to other datasets and sensitive attributes. We release our code, models and raw data files at https://github.com/dooleys/FR-NAS.
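Pareto dominance, the criterion used above to compare architectures on accuracy and fairness, can be sketched as follows. The bias metric (`disparity`, lower is better) and all numbers are hypothetical placeholders, not results from the paper.

```python
from typing import List, NamedTuple


class Model(NamedTuple):
    name: str
    accuracy: float   # higher is better
    disparity: float  # hypothetical bias metric; lower is better


def dominates(a: Model, b: Model) -> bool:
    """a Pareto-dominates b: no worse on both objectives, strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.disparity <= b.disparity
    strictly_better = a.accuracy > b.accuracy or a.disparity < b.disparity
    return no_worse and strictly_better


def pareto_front(models: List[Model]) -> List[Model]:
    """Keep the models that no other model dominates (a model never dominates itself)."""
    return [m for m in models if not any(dominates(other, m) for other in models)]


# Hypothetical candidates for illustration only.
models = [
    Model("A", 0.95, 0.10),
    Model("B", 0.93, 0.05),
    Model("C", 0.92, 0.12),  # dominated by A: less accurate and more biased
    Model("D", 0.96, 0.20),
]
front = pareto_front(models)
```

Here A, B and D each trade accuracy against disparity, so all three stay on the front, while C is dominated.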


Neural Architecture Optimization

Neural Information Processing Systems

Automatic neural architecture design has shown its potential in discovering powerful neural network architectures. Existing methods, whether based on reinforcement learning or evolutionary algorithms (EA), conduct architecture search in a discrete space, which is highly inefficient. In this paper, we propose a simple and efficient method for automatic neural architecture design based on continuous optimization. We call this new approach neural architecture optimization (NAO). There are three key components in our proposed approach: (1) An encoder embeds/maps neural network architectures into a continuous space.
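The core idea of searching in a continuous space can be illustrated with a toy sketch. Only component (1), the encoder, is described above; here it is replaced by a hypothetical averaged-embedding encoder, and a hypothetical linear score stands in for whatever quantity the method optimizes. The sketch shows only why a continuous representation admits gradient steps that a discrete space does not; it is not NAO's implementation.

```python
from typing import Dict, List

DIM = 3
# Hypothetical embedding table: one continuous vector per discrete operation token.
EMBED: Dict[str, List[float]] = {
    "conv3x3": [1.0, 0.0, 0.5],
    "conv5x5": [0.8, 0.3, 0.0],
    "maxpool": [0.0, 1.0, 0.2],
    "identity": [0.1, 0.1, 1.0],
}


def encode(arch: List[str]) -> List[float]:
    """Map a discrete architecture (a sequence of op tokens) to one continuous
    vector by averaging token embeddings; a stand-in for a learned encoder."""
    vec = [0.0] * DIM
    for op in arch:
        for i, x in enumerate(EMBED[op]):
            vec[i] += x / len(arch)
    return vec


def score(z: List[float], w: List[float]) -> float:
    """Hypothetical linear surrogate of architecture quality: f(z) = w . z."""
    return sum(zi * wi for zi, wi in zip(z, w))


def ascend(z: List[float], w: List[float], lr: float) -> List[float]:
    """One gradient-ascent step in the continuous space; the gradient of w . z is w."""
    return [zi + lr * wi for zi, wi in zip(z, w)]


z = encode(["conv3x3", "maxpool", "identity"])
w = [0.5, -0.2, 0.3]
z_new = ascend(z, w, lr=0.1)
```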


Towards modular and programmable architecture search

Neural Information Processing Systems

Neural architecture search methods are able to find high-performance deep learning architectures with minimal effort from an expert. However, current systems focus on specific use-cases (e.g.


Convergence beyond the over-parameterized regime using Rayleigh quotients

Neural Information Processing Systems

In this paper, we present a new strategy to prove the convergence of deep learning architectures to a zero training (or even testing) loss by gradient flow. Our analysis is centered on the notion of Rayleigh quotients in order to prove Kurdyka-Łojasiewicz inequalities for a broader set of neural network architectures and loss functions. We show that Rayleigh quotients provide a unified view for several convergence analysis techniques in the literature. Our strategy produces a proof of convergence for various examples of parametric learning. In particular, our analysis does not require the number of parameters to tend to infinity, nor the number of samples to be finite, thus extending to test loss minimization and beyond the over-parameterized regime.
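To illustrate how a Rayleigh quotient can yield a Łojasiewicz-type inequality, here is the standard squared-loss special case under gradient flow. It is a textbook-style sketch assuming residuals r(θ) with Jacobian J, not the paper's general theorem, which covers broader architectures and loss functions.

```latex
% Squared loss on residuals r(\theta) \in \mathbb{R}^n, with Jacobian J = \partial r / \partial \theta
\mathcal{L}(\theta) = \tfrac{1}{2}\,\lVert r(\theta) \rVert^2,
\qquad \nabla_\theta \mathcal{L} = J^\top r .

% Rayleigh quotient of the PSD matrix J J^\top at the residual (r \neq 0); note r^\top r = 2\mathcal{L}
R(\theta) = \frac{r^\top J J^\top r}{r^\top r},
\qquad \lVert \nabla_\theta \mathcal{L} \rVert^2 = r^\top J J^\top r = 2\,R(\theta)\,\mathcal{L}(\theta).

% Gradient flow \dot\theta = -\nabla_\theta \mathcal{L}; if R(\theta(t)) \ge \lambda > 0 along the trajectory:
\frac{d}{dt}\mathcal{L}(\theta(t)) = -\lVert \nabla_\theta \mathcal{L} \rVert^2
  = -2\,R(\theta(t))\,\mathcal{L}(\theta(t)) \le -2\lambda\,\mathcal{L}(\theta(t))
\;\Longrightarrow\; \mathcal{L}(\theta(t)) \le e^{-2\lambda t}\,\mathcal{L}(\theta(0)).
```

A uniform lower bound on the Rayleigh quotient along the trajectory is thus exactly what turns the gradient-norm identity into exponential decay of the loss.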


Learning the Geometry of Wave-Based Imaging

Neural Information Processing Systems

We propose a general physics-based deep learning architecture for wave-based imaging problems. A key difficulty in imaging problems with a varying background wave speed is that the medium "bends" the waves differently depending on their position and direction. This space-bending geometry makes the equivariance to translations of convolutional networks an undesired inductive bias. We build an interpretable neural architecture inspired by Fourier integral operators (FIOs) which approximate the wave physics.
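The point about translation equivariance can be checked numerically. The sketch below assumes 1D signals with circular boundaries: a convolution that shares one kernel across positions commutes with shifts, while a hypothetical operator that uses a different kernel at each position (a crude stand-in for a medium that treats waves differently depending on position) does not. It does not reproduce the paper's FIO-inspired architecture.

```python
from typing import List


def shift(x: List[float], s: int) -> List[float]:
    """Circular shift: output[i] = x[(i - s) mod n]."""
    n = len(x)
    return [x[(i - s) % n] for i in range(n)]


def circular_conv(x: List[float], k: List[float]) -> List[float]:
    """Convolution with one kernel shared at every position (circular boundary)."""
    n = len(x)
    return [sum(k[j] * x[(i - j) % n] for j in range(len(k))) for i in range(n)]


def position_dependent(x: List[float], kernels: List[List[float]]) -> List[float]:
    """Hypothetical operator with a different kernel at each output position i."""
    n = len(x)
    return [sum(kernels[i][j] * x[(i - j) % n] for j in range(len(kernels[i])))
            for i in range(n)]


x = [1.0, 2.0, 0.0, -1.0]
k = [0.5, 0.25]
kernels = [[1.0], [2.0], [3.0], [4.0]]  # hypothetical per-position kernels

# Equivariance check: does "shift, then apply" equal "apply, then shift"?
conv_commutes = circular_conv(shift(x, 1), k) == shift(circular_conv(x, k), 1)
pd_commutes = position_dependent(shift(x, 1), kernels) == shift(position_dependent(x, kernels), 1)
```

The shared kernel gives `conv_commutes == True`, while the position-dependent operator breaks the symmetry, which is the property the paragraph argues is undesirable when the background medium varies in space.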


GraphMETRO: Mitigating Complex Graph Distribution Shifts via Mixture of Aligned Experts

Neural Information Processing Systems

Graph data are inherently complex and heterogeneous, leading to a high natural diversity of distributional shifts. However, it remains unclear how to build machine learning architectures that generalize to the complex distributional shifts naturally occurring in the real world. Here, we develop GraphMETRO, a Graph Neural Network architecture that models natural diversity and captures complex distributional shifts. GraphMETRO employs a Mixture-of-Experts (MoE) architecture with a gating model and multiple expert models, where each expert model targets a specific distributional shift to produce a referential representation w.r.t. a reference model, and the gating model identifies shift components. Additionally, we design a novel objective that aligns the representations from different expert models to ensure reliable optimization. GraphMETRO achieves state-of-the-art results on four datasets from the GOOD benchmark, which comprises complex and natural real-world distribution shifts, with improvements of 67% and 4.2% on the WebKB and Twitch datasets, respectively.
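The gate-and-experts structure described above can be sketched generically. This is a plain mixture-of-experts combination under stated assumptions: softmax gating, toy vector-valued experts, and a weighted sum as the aggregation. GraphMETRO's actual gating model, graph encoders and alignment objective are not reproduced here.

```python
import math
from typing import Callable, List

Vector = List[float]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax (assumed gating nonlinearity)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_of_experts(x: Vector,
                       gate: Callable[[Vector], List[float]],
                       experts: List[Callable[[Vector], Vector]]) -> Vector:
    """Weight each expert's output by its gate score and sum the results."""
    weights = softmax(gate(x))
    outputs = [expert(x) for expert in experts]
    dim = len(outputs[0])
    return [sum(w * out[d] for w, out in zip(weights, outputs)) for d in range(dim)]


# Toy usage: three hypothetical experts and a gate that scores them equally.
experts: List[Callable[[Vector], Vector]] = [
    lambda v: list(v),
    lambda v: [2.0 * a for a in v],
    lambda v: [0.0 for _ in v],
]


def uniform_gate(v: Vector) -> List[float]:
    """Hypothetical gate returning equal logits for all three experts."""
    return [0.0, 0.0, 0.0]


out = mixture_of_experts([1.0, 2.0], uniform_gate, experts)
```

With equal gate scores the output is the plain average of the expert outputs; a trained gate would instead shift weight toward the experts matching the detected shift components.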